Andrej Karpathy Flash News List

Time	Details
2025-12-28 00:04	Andrej Karpathy: Claude Code Probes Lutron Home IoT, Finds Controllers, Open Ports, and Firmware Metadata According to Andrej Karpathy, he directed Claude Code to test his Lutron home automation setup, and the agent located Lutron controllers on his local Wi‑Fi, scanned for open ports, connected, retrieved device metadata, and identified device firmware versions (source: Andrej Karpathy on X, Dec 28, 2025). Karpathy adds that the sequence demonstrated service enumeration and device identification across the LAN, showcasing autonomous network reconnaissance by an AI coding agent (source: Andrej Karpathy on X, Dec 28, 2025). He does not report any exploit execution or unauthorized configuration changes, indicating the activity was limited to discovery and information gathering (source: Andrej Karpathy on X, Dec 28, 2025). The post does not reference cryptocurrencies, tokens, or blockchain systems, so any crypto-market impact is not specified in the source (source: Andrej Karpathy on X, Dec 28, 2025). Source
2025-12-10 17:25	Andrej Karpathy Announces nanoGPT as First LLM to Train and Run Inference in Space — What Traders Should Know According to @karpathy, nanoGPT is the first large language model to both train and run inference in space, marking the start of the effort. According to @karpathy, the announcement confirms the initiative has begun but does not disclose technical specifications, mission details, partners, or a timeline. According to @karpathy, the post does not reference any cryptocurrencies, tokens, or market integrations, which limits immediate data-driven trading conclusions and frames this as a sentiment-driven headline for AI and compute narratives. Source
2025-12-10 17:15	Andrej Karpathy Benchmarks GPT-5.1 Thinking API on 930 Hacker News Threads: 3 Hours Build, 1 Hour Run, $60 Cost According to @karpathy, he used the GPT-5.1 Thinking API to auto-grade all 930 December 2015 Hacker News frontpage article-discussion pairs to identify the most and least prescient comments, taking about 3 hours to write the code and roughly 1 hour and $60 to run, source: twitter.com/karpathy/status/1998803709468487877 and karpathy.bearblog.dev/auto-grade-hn. According to @karpathy, the project repository is available at github.com/karpathy/hn-time-capsule and the full results are browsable at karpathy.ai/hncapsule, source: twitter.com/karpathy/status/1998803709468487877. According to @karpathy, he emphasized in-hindsight analysis as a practical way to train forward prediction models and noted that future LLMs will perform such work cheaper, faster, and better, source: twitter.com/karpathy/status/1998803709468487877. According to @karpathy, the top 10 most prescient HN accounts for that month were pcwalton, tptacek, paulmd, cstross, greglindahl, moxie, hannob, 0xcde4c3db, Manishearth, and johncolanduoni, source: twitter.com/karpathy/status/1998803709468487877. According to @karpathy, these run-time and cost figures provide a concrete real-world datapoint for large-scale LLM evaluation workflows using GPT-5.1 Thinking, anchored at approximately $60 for a 930-thread pass in about one hour, which traders tracking AI infrastructure efficiency can use as a benchmark, source: twitter.com/karpathy/status/1998803709468487877 and karpathy.bearblog.dev/auto-grade-hn. Source
2025-12-09 03:40	Python random.seed Sign Bug: seed(5) equals seed(-5) — Critical Risk for AI and Crypto Trading Backtests According to @karpathy, CPython’s random.seed ignores the sign of integer seeds, so seed(3) and seed(-3) produce identical RNG streams because the implementation takes the absolute value of PyLong arguments (source: twitter.com/karpathy/status/1998236299862659485; source: github.com/python/cpython/blob/main/Modules/_randommodule.c#L321). The Python docs state that if a is an int, it is used directly, and that the core generator is MT19937, but they only guarantee same seed => same sequence and do not promise distinct sequences for different seeds (source: docs.python.org/3/library/random.html). Karpathy reports this caused train=test leakage in his nanochat setup when he used seed sign to separate train/test splits, creating a serious reproducibility and overfitting risk (source: twitter.com/karpathy/status/1998236299862659485). For trading systems and crypto quants using Python for strategy simulation, Monte Carlo VaR, order routing randomness, or ML model evaluation, audit any pipelines that rely on sign-differentiated seeds or assume seed(n) != seed(-n) to avoid biased backtests and invalid performance metrics (source: twitter.com/karpathy/status/1998236299862659485). Actionable mitigations include avoiding negative-vs-positive seed conventions, using string or bytes seeds that are hashed via SHA-512 under version 2 seeding, or explicitly encoding the sign bit as 2*abs(n)+int(n<0) as noted by Karpathy (source: docs.python.org/3/library/random.html; source: twitter.com/karpathy/status/1998236299862659485). Source
2025-12-07 18:13	Andrej Karpathy says LLMs are simulators, not agents for crypto trading research on BTC and ETH According to Andrej Karpathy, large language models should be treated as simulators that channel multiple perspectives rather than as entities with their own opinions, source: Andrej Karpathy on X. He advises replacing you centric questions with prompts that ask what different groups would say, which is directly applicable to structuring crypto market research, source: Andrej Karpathy on X. Applying this to trading workflows, practitioners can prompt simulated bulls, bears, and market makers to generate scenario narratives for BTC and ETH without assuming the model holds a personal view, source: Andrej Karpathy on X. He adds that forcing a you voice only makes the model adopt a personality implied by finetuning data statistics, reinforcing role based simulation as the correct mental model for AI assisted analysis, source: Andrej Karpathy on X. Source
2025-11-24 17:35	Andrej Karpathy’s Definitive View: AI Homework Detection Is Impossible — What Traders Should Know Now According to @karpathy on X on Nov 24, 2025, AI use in homework cannot be detected and current AI detectors do not work, underscoring inevitable adoption of generative AI in schools (Source: @karpathy on X, Nov 24, 2025). According to @karpathy on X on Nov 24, 2025, he briefed a school board and shared highlights urging adaptation to AI in education rather than reliance on detection tools (Source: @karpathy on X, Nov 24, 2025). According to @karpathy on X on Nov 24, 2025, the post contains no references to cryptocurrencies or trading, indicating no stated direct crypto market impact (Source: @karpathy on X, Nov 24, 2025). Source
2025-11-23 18:03	Andrej Karpathy Demo: Gemini Nano Banana Pro Solves Exam Image Questions in Real-World Test; Traders Watch GOOGL and AI Tokens RNDR, FET According to @karpathy, Gemini Nano Banana Pro solved chemistry exam questions directly from an image of the exam page, correctly parsing doodles and diagrams, with ChatGPT later judging the answers correct except for a nomenclature fix on Se2P2 and a spelling correction for thiocyanic acid, source: Andrej Karpathy on X, Nov 23, 2025. The demo evidences in-image multimodal parsing and reasoning on dense document layouts, which aligns with Google’s Gemini family positioning and the inclusion of Nano in the product lineup, source: Andrej Karpathy on X, Nov 23, 2025; Google DeepMind Gemini introduction, Dec 2023. Historically, prominent AI capability reveals have coincided with rotations into AI-linked crypto assets such as RNDR and FET and related equities after major AI news, source: Reuters reporting on AI token rallies during the ChatGPT surge in Feb 2023 and after Nvidia earnings in May 2024. Traders may watch Alphabet GOOGL and AI infrastructure tokens for narrative momentum if this demo draws broader attention, while noting the accuracy risk highlighted by the Se2P2 naming and spelling errors, source: Andrej Karpathy on X, Nov 23, 2025; Reuters Feb 2023 and May 2024. Source
2025-11-22 23:54	Andrej Karpathy unveils llm-council open-source multi-LLM ensemble via OpenRouter; GPT-5.1 ranked highest by peers, Claude lowest According to @karpathy, he released an open-source llm-council web app that dispatches each user query to multiple models via OpenRouter, lets models review and rank anonymized responses, and then a Chairman LLM produces the final answer, detailing a concrete multi-LLM ensemble workflow. Source: @karpathy on X. According to @karpathy, the current council includes openai/gpt-5.1, google/gemini-3-pro-preview, anthropic/claude-sonnet-4.5, and x-ai/grok-4, providing side-by-side outputs and rankings across OpenAI, Google, Anthropic, and xAI model families. Source: @karpathy on X. According to @karpathy, cross-model evaluation frequently selects another model’s response as superior, highlighting a practical peer-review method for model selection and ranking. Source: @karpathy on X. According to @karpathy, in his reading tests the models consistently praised GPT-5.1 as the best and most insightful and consistently selected Claude as the worst, with Gemini 3 Pro and Grok-4 in between, while his qualitative take found GPT-5.1 wordy, Gemini 3 more condensed, and Claude too terse. Source: @karpathy on X. According to @karpathy, the code is publicly available for others to try on GitHub under the llm-council repository. Source: @karpathy on X and @karpathy on GitHub. According to @karpathy, the post does not mention cryptocurrencies, tokens, or blockchains, and provides no direct crypto market claims. Source: @karpathy on X. Source
2025-11-22 02:11	Andrej Karpathy seeks quantitative definition of AI 'slop' and a measurable 'slop index' using LLM miniseries and thinking token budgets for evaluation According to @karpathy, he is seeking a quantitative, measurable definition of AI 'slop' and notes he has an intuitive 'slop index' but lacks a formal metric. Source: @karpathy on X, Nov 22, 2025. According to @karpathy, potential approaches he is considering include using LLM miniseries and analyzing thinking token budgets to quantify output quality and cost. Source: @karpathy on X, Nov 22, 2025. For traders in AI and crypto-adjacent markets, this post highlights an active gap in standardized LLM quality metrics that directly ties to model evaluation and cost controls, which are key inputs for pricing and benchmarking AI products. Source: @karpathy on X, Nov 22, 2025. Source
2025-11-21 16:43	Andrej Karpathy on AI Intelligence Diversity: No Direct Crypto Trading Catalyst for Markets According to @karpathy, the space of intelligences is large and animal intelligence is only a single point arising from a specific optimization process fundamentally distinct from that of artificial systems. Source: @karpathy on X, Nov 21, 2025. The post is conceptual and provides no product announcements, model releases, datasets, performance metrics, timelines, or any crypto asset or token mentions, indicating no direct trading catalyst for crypto or equities. Source: @karpathy on X, Nov 21, 2025. For crypto market context, this statement aligns with the broader AI agents and autonomous intelligence narrative, but the source offers no on-chain, protocol, or market data. Source: @karpathy on X, Nov 21, 2025. Source
2025-11-18 00:29	Andrej Karpathy details 3-pass LLM reading workflow and shift toward writing for LLMs According to @karpathy, he now reads blogs, articles, and book chapters using a three-pass LLM workflow: pass 1 manual reading, pass 2 explain and summarize, and pass 3 Q&A, which he says yields a deeper understanding than moving on, source: @karpathy on X, Nov 18, 2025. He adds that this habit is growing into one of his top LLM use cases, source: @karpathy on X, Nov 18, 2025. He also states that writers may increasingly write for an LLM so the model first internalizes the idea and then targets, personalizes, and serves it to users, source: @karpathy on X, Nov 18, 2025. The post does not mention cryptocurrencies or trading signals, indicating any crypto market relevance would be indirect via LLM usage patterns in content consumption and personalization, source: @karpathy on X, Nov 18, 2025. Source
2025-11-17 18:56	Crypto Trading Discipline: Andrej Karpathy Urges Principles Over Galaxy Brain Rationalization with 2 Actionable Strategies for Volatile Markets According to @karpathy, traders should prioritize rule-based principles and avoid post-hoc galaxy brain justifications, citing two actionable strategies: have principles and hold the right bags, financially and socially; source: @karpathy on X, Nov 17, 2025; x.com/VitalikButerin/status/1986906940472238108. According to @karpathy, applying constraint-based rules akin to simple guardrails is preferable to flexible utility calculus, reinforcing disciplined entries, position sizing, and clear no-trade conditions during volatility; source: @karpathy on X, Nov 17, 2025. According to @karpathy, aligning positions with long-term conviction and social capital helps avoid rotating into narratives you cannot defend under stress, supporting consistent execution in crypto markets; source: @karpathy on X, Nov 17, 2025. Source
2025-10-26 16:24	PyTorch MPS addcmul_ Silent-Failure Bug on Non-Contiguous Tensors Flags AI Training Risk: What Traders Should Watch According to @karpathy, a detailed debugging investigation traced a suspicious training loss curve to a PyTorch MPS backend issue where addcmul_ silently fails on non-contiguous output tensors in the Objective-C++ path, pointing to a correctness bug that does not throw errors during training; Source: @karpathy on X https://twitter.com/karpathy/status/1982483540899237981 and the referenced thread by @ElanaPearl https://x.com/ElanaPearl/status/1981389648695025849. For AI workflow reliability, this implies Mac Apple MPS-based training can yield incorrect results without explicit runtime alerts, directly impacting the integrity of model training and evaluation pipelines used by practitioners; Source: @karpathy on X https://twitter.com/karpathy/status/1982483540899237981 and @ElanaPearl on X https://x.com/ElanaPearl/status/1981389648695025849. For traders, treat this as a software reliability risk flag within the AI toolchain and monitor official PyTorch or Apple MPS updates and release notes that reference addcmul_ or non-contiguous tensor handling, as confirmed fixes would reduce operational uncertainty around AI workloads that markets track for sentiment; Source: @karpathy on X https://twitter.com/karpathy/status/1982483540899237981 and @ElanaPearl on X https://x.com/ElanaPearl/status/1981389648695025849. Source
2025-10-21 15:59	Andrej Karpathy Unveils nanochat d32: $800 Synthetic-Data Custom LLM Identity and Script Release, Key Signals for AI Agent Builders According to @karpathy, nanochat now carries a defined identity and can state its capabilities, including that it is nanochat d32 built by him with a reported $800 cost and weaker non-English proficiency, achieved via synthetic data generation, source: x.com/karpathy/status/1980508380860150038. He released an example script that demonstrates generating diverse synthetic conversations and mixing them into mid-training or SFT, stressing the importance of entropy to avoid repetitive datasets, source: x.com/karpathy/status/1980508380860150038. He adds that base LLMs lack inherent personality or self-knowledge and require explicitly bolted-on traits via curated synthetic data, source: x.com/karpathy/status/1980508380860150038. For traders, the disclosed $800 customization benchmark and open-source workflow provide concrete cost and process reference points for evaluating open-source AI agent development and adoption paths across AI-linked tokens and AI-exposed equities, source: twitter.com/karpathy/status/1980665134415802554. Source
2025-10-20 22:13	Andrej Karpathy: DeepSeek-OCR Signals 4 Reasons Pixels May Beat Text Tokens for LLM Inputs — Efficiency, Shorter Context Windows, Bidirectional Attention, No Tokenizer According to Andrej Karpathy, the DeepSeek-OCR paper is a strong OCR model and more importantly highlights why pixels might be superior to text tokens as inputs to large language models, emphasizing model efficiency and input fidelity, source: Andrej Karpathy on X, Oct 20, 2025. He states that rendering text to images and feeding pixels can deliver greater information compression, enabling shorter context windows and higher efficiency, source: Andrej Karpathy on X, Oct 20, 2025. He adds that pixel inputs provide a more general information stream that preserves formatting such as bold and color and allows arbitrary images alongside text, source: Andrej Karpathy on X, Oct 20, 2025. He argues that image inputs enable bidirectional attention by default instead of autoregressive attention at the input stage, which he characterizes as more powerful for processing, source: Andrej Karpathy on X, Oct 20, 2025. He advocates removing the tokenizer at input due to the complexity and risks of Unicode and byte encodings, including security or jailbreak issues such as continuation bytes and semantic mismatches for emojis, source: Andrej Karpathy on X, Oct 20, 2025. He frames OCR as one of many vision-to-text tasks and suggests many text-to-text tasks can be reframed as vision-to-text, while the reverse is not generally true, source: Andrej Karpathy on X, Oct 20, 2025. He proposes a practical setup where user messages are images while the assistant response remains text and notes outputting pixels is less obvious, and he mentions an urge to build an image-input-only version of nanochat while referencing the vLLM project, source: Andrej Karpathy on X, Oct 20, 2025. Source
2025-10-16 00:14	Karpathy Unveils $1,000 nanochat d32: 33-Hour Train, CORE 0.31, GSM8K 20% — Watch AI Compute Tokens RNDR, AKT, TAO According to @karpathy, the depth-32 nanochat d32 trained for about 33 hours at roughly $1,000 and showed consistent metric gains across pretraining, SFT, and RL (Source: Karpathy on X; Karpathy GitHub nanochat discussion). He reports a CORE score of 0.31 versus GPT-2 at about 0.26 and GSM8K improvement from around 8% to about 20%, indicating a notable uplift for a micro model (Source: Karpathy on X; Karpathy GitHub nanochat discussion). He cautions that nanochat costs $100–$1,000 to train and the $100 version is about 1/1000th the size of GPT-3, leading to frequent hallucinations and limited reliability compared to frontier LLMs, so user expectations should remain modest (Source: Karpathy on X). He adds that scripts including run1000 sh are available in the repo, he is temporarily hosting the model for testing, and he plans throughput tuning before possibly scaling to a larger tier (Source: Karpathy on X; Karpathy GitHub repository). For traders, decentralized GPU networks that market AI workload support such as Render (RNDR), Akash (AKT), and Bittensor (TAO) remain key watchlist names as open-source, low-cost training expands developer experimentation (Source: Render Network documentation; Akash Network documentation; Bittensor documentation). Source
2025-10-13 15:16	Andrej Karpathy Releases nanochat: Train a ChatGPT-Style LLM in 4 Hours for about $100 on 8x H100, Setting Clear GPU Cost Benchmarks for Traders According to @karpathy, nanochat is a minimal from-scratch full-stack pipeline that lets users train and serve a simple ChatGPT-like LLM via a single script on a cloud GPU and converse with it in a web UI in about 4 hours, enabling an end-to-end training and inference workflow. source: @karpathy. He specifies the codebase has about 8,000 lines and includes tokenizer training in Rust, pretraining on FineWeb with CORE evaluation, midtraining on SmolTalk and multiple-choice data with tool use, supervised fine-tuning, optional RL on GSM8K via GRPO, and an inference engine with KV cache, Python tool use, CLI, a ChatGPT-like web UI, plus an auto report card. source: @karpathy. Disclosed cost and timing benchmarks are about $100 for roughly 4 hours on an 8x H100 node and about $1000 for about 41.6 hours, with a 24-hour depth-30 run reaching MMLU in the 40s, ARC-Easy in the 70s, and GSM8K in the 20s. source: @karpathy. From these figures, the implied compute rate is roughly $3.1 per H100-hour (about $100 across 32 H100-hours) and about $3.0 per H100-hour at the longer run (about $1000 across 332.8 H100-hours), providing concrete GPU-hour cost benchmarks for trading models of AI training spend. source: @karpathy. He also notes that around 12 hours surpasses GPT-2 on the CORE metric and that capability improves with more training, positioning nanochat as a transparent strong-baseline stack and the capstone for LLM101n with potential as a research harness. source: @karpathy. For crypto market participants tracking AI infrastructure, these cost-performance disclosures offer reference points to assess demand for centralized cloud and decentralized GPU compute tied to open-source LLM training workflows. source: @karpathy. Source
2025-10-09 00:10	Andrej Karpathy flags RLHF flaw: LLMs fear exceptions and calls for reward redesign in RL training According to Andrej Karpathy, current reinforcement learning practices make LLMs mortally terrified of exceptions, and he argues exceptions are a normal part of a healthy development process, as stated on Twitter on Oct 9, 2025. Karpathy urged the community to sign his LLM welfare petition to improve rewards in cases of exceptions, as stated on Twitter on Oct 9, 2025. The post includes no references to cryptocurrencies, tokens, or market data, indicating no direct market update from the source, as stated on Twitter on Oct 9, 2025. Source
2025-10-03 13:37	Karpathy: LLM Agent Coding Not Ready for Half of Professional Work Despite ~50% ‘Mostly Agent’ Poll Signal According to Andrej Karpathy, an X poll he referenced showed roughly half of respondents reporting they mostly use agent‑mode coding, contrary to his expectation of 50 percent tab‑complete, 30 percent manual, 20 percent agent, source: Andrej Karpathy on X, Oct 3, 2025, https://x.com/karpathy/status/1974106507034964111; poll link https://x.com/karpathy/status/1973892769359056997. He states his own workflow is primarily tab completion and he turns it off when not useful, using agents mainly for boilerplate or unfamiliar stacks with substantial review and edits, source: Andrej Karpathy on X, Oct 3, 2025, https://x.com/karpathy/status/1974106507034964111. He warns that when tasks are deep, tangled, or off the data manifold, LLMs produce bloated code with subtle bugs, concluding agent mode is not ready to write about half of professional code, source: Andrej Karpathy on X, Oct 3, 2025, https://x.com/karpathy/status/1974106507034964111. He asked for a serious organization to rerun the poll, underscoring uncertainty around actual adoption rates, source: Andrej Karpathy on X, Oct 3, 2025, https://x.com/karpathy/status/1974106507034964111. There was no mention of cryptocurrencies or blockchain in his comments, source: Andrej Karpathy on X, Oct 3, 2025, https://x.com/karpathy/status/1974106507034964111. Source
2025-10-01 19:22	Andrej Karpathy: Tinker Cuts LLM Post-Training Complexity to Under 10% and Keeps 90% Algorithmic Control for Faster Finetuning According to @karpathy, Tinker allows researchers and developers to retain roughly 90% of algorithmic creative control over data, loss functions, and training algorithms while offloading infrastructure, forward and backward passes, and distributed training to the framework. Source: @karpathy on X, Oct 1, 2025, https://twitter.com/karpathy/status/1973468610917179630 According to @karpathy, Tinker reduces the typical complexity of LLM post-training to well below 10%, positioning it as a lower-friction alternative to common “upload your data, we’ll train your LLM” services. Source: @karpathy on X, Oct 1, 2025, https://twitter.com/karpathy/status/1973468610917179630 According to @karpathy, this “slice” of the post-training workflow both delegates heavy lifting and preserves majority control of data and algorithmic choices, which he views as a more effective trade-off for practitioners. Source: @karpathy on X, Oct 1, 2025, https://twitter.com/karpathy/status/1973468610917179630 According to @karpathy, finetuning is less about stylistic changes and more about narrowing task scope, where fine-tuned smaller LLMs can outperform and run faster than large models prompted with giant few-shot prompts when ample training examples exist. Source: @karpathy on X, Oct 1, 2025, https://twitter.com/karpathy/status/1973468610917179630 According to @karpathy, production LLM applications are increasingly DAG-based pipelines where some steps remain prompt-driven while many components work better as fine-tuned models, and Tinker makes these finetunes trivial for rapid experimentation. Source: @karpathy on X, Oct 1, 2025, https://twitter.com/karpathy/status/1973468610917179630; supporting reference: Thinky Machines post, https://x.com/thinkymachines/status/1973447428977336578 Source

2025-12-28
00:04

Andrej Karpathy: Claude Code Probes Lutron Home IoT, Finds Controllers, Open Ports, and Firmware Metadata

According to Andrej Karpathy, he directed Claude Code to test his Lutron home automation setup, and the agent located Lutron controllers on his local Wi‑Fi, scanned for open ports, connected, retrieved device metadata, and identified device firmware versions (source: Andrej Karpathy on X, Dec 28, 2025). Karpathy adds that the sequence demonstrated service enumeration and device identification across the LAN, showcasing autonomous network reconnaissance by an AI coding agent (source: Andrej Karpathy on X, Dec 28, 2025). He does not report any exploit execution or unauthorized configuration changes, indicating the activity was limited to discovery and information gathering (source: Andrej Karpathy on X, Dec 28, 2025). The post does not reference cryptocurrencies, tokens, or blockchain systems, so any crypto-market impact is not specified in the source (source: Andrej Karpathy on X, Dec 28, 2025).

List of Flash News about Andrej Karpathy